Because the constant-extension ensemble of a single chain molecule is not always equivalent to the constant-force ensemble, this paper builds a model of double-stranded conformations, such as those in RNA molecules and β-sheets in proteins, under a fixed-extension constraint. Based on polymer-graph theory and self-avoiding walks, the model takes sequence dependence and excluded-volume interactions into account explicitly. Using the model, we investigate force-extension curves, contact distributions, and force-temperature curves at given extensions. We find that, for the same homogeneous chains, the force-extension curves are almost consistent with the extension-force curves of the conjugate constant-force ensemble, and that the degree of consistency depends on chain length. The curves of the two ensembles differ completely, however, once sequences are considered. In addition, contact distributions of homogeneous sequences show that the double-stranded regions in hairpin conformations tend to be located at the two sides of the chain. We attribute this unexpected phenomenon to the nonuniformity of the excluded-volume interactions between the region and two tails of different lengths; the tendency disappears if the interactions are canceled. Finally, in the constant-extension ensemble, force-flipping transitions conjugate to the re-entering phenomena of the constant-force ensemble are observed in hairpin conformations, whereas they are absent in secondary structure conformations.






I. INTRODUCTION 



In recent years, advances in manipulation techniques have made it possible to measure and characterize biological macromolecules at the single-molecule level. Using devices such as optical and magnetic tweezers and atomic force microscopes, basic mechanical, physical and chemical properties of fundamental biological objects, e.g., proteins, nucleic acids, and molecular motors, have been obtained. Special effort has been devoted to the mechanical properties of nucleic acids, from early elastic stretching experiments to the more recent unzipping of double-stranded DNA (dsDNA), single-stranded DNA (ssDNA) and RNA. Many useful insights can be obtained by analyzing the extension-force curves (EFCs) or force-extension curves (FECs) recorded in experiments, e.g., the discovery of the S-dsDNA structure and the measurement of the proportion of G/C relative to A/T content along a dsDNA sequence.

On the theoretical side, a number of models have been built to interpret or simulate experimental data and phenomena in nucleic acid mechanics. The elastic properties of stretched dsDNA were described by Marko and Siggia, and by Zhou et al. [10]. Montanari and Mézard revealed that a second-order phase transition appears when ssDNA is stretched [11]. Lubensky and Nelson presented an extensive theoretical investigation of the mechanical unzipping of dsDNA [13]. Gerland et al. explored quantitatively how secondary structures determine the outcome of FECs. Although great effort has been devoted to the DNA mechanical problem, we note that few theoretical works have addressed constant extension ensembles of ssDNA or RNA [13], in which the end-to-end separation r of the chain is fixed and the average force is measured. It has long been known that, for a traditional single-polymer system, the constant force and constant extension ensembles are equivalent in the thermodynamic limit, though inequivalence might be expected for finite systems. For nucleic acids, however, the situation is more complex because of the presence of monomer interactions and the absence of self-averaging arising from sequences. To study such ensembles further, we modify and extend the statistical model of double-stranded chain conformations developed by Chen and Dill to the fixed extension scenario. Unlike previous theories, the extended model retains a relatively high degree of realism, in that sequence dependence and excluded-volume (EV) interactions are taken into account explicitly. Experiments have shown that sequences change EFCs or FECs dramatically, while a very recent ssDNA stretching experiment revealed the importance of EV interactions, which had previously been neglected. Moreover, the model is more "microscopic", i.e., the entropy of the chain is computed directly from the number of conformations. As a first step, we restrict the model to a two-dimensional (2D) lattice, and for simplicity the extension is specified to be one component of the separation r. The more general case of constant r will be treated in future work.

The paper is organized as follows. In Sec. II, the statistical model of double-stranded chain molecules is briefly reviewed. In Sec. III, we extend two classes of conformations, hairpin and RNA-like secondary structure, to the constant extension case. In Sec. IV, we first investigate how the FECs of the two classes of conformations change with temperature, chain length and sequence. To relate FECs to molecular structure, monomer-monomer contact probability distributions are then introduced. As an illustration, the distributions of the simpler hairpin conformations are calculated. Unexpectedly, hairpin contacts tend to form at the two sides of the chain. To explore the underlying reason, we introduce the Asymmetric Function (AF), and find that this phenomenon results from nonuniform EV interactions between the double-stranded region and two tails of different lengths in hairpin conformations. In the last part of that section, force-temperature curves (FTCs) at fixed extensions are computed for homogeneous double-stranded chains. This study is motivated by the re-entering transitions observed in constant force ensembles. Force flipping phenomena are observed in hairpin conformations but are absent in secondary structure conformations. Here "flipping" means that, as temperature decreases, the force first decreases, then increases, and finally decreases again once the temperature falls below some value. We believe that this phenomenon in constant extension ensembles is conjugate to the re-entering transition. Section V is our conclusion. The calculation of extensions for a special case of hairpin conformations is relegated to the Appendix.



II. THE DOUBLE-STRANDED CHAIN MODEL 



The details of the statistical model of double-stranded chain molecules can be found in the work of Chen and Dill. Here we give only a brief review.



A. Polymer graph theory 

The model is based on polymer graphs, which are diagrammatic representations of the self-contacts made by different chain conformations. Fig. 1 shows a hairpin conformation and the corresponding polymer graph: vertices represent the chain monomers, straight links symbolize the covalent bonds, and curved links stand for spatial contacts between monomers. A given polymer graph represents an ensemble of chain conformations that are consistent with the given




FIG. 1: The polymer graph of hairpin conformations. The shadowed region is a face of the graph. The graph is divided into two parts: two tails $[s, s_0]$ and $[l_0, l]$, and one double-stranded region, or CG, enclosed by the outermost link $(s_0, l_0)$. Here $s$ and $l$ are the two end monomers of the chain. The double-stranded region is composed of links nested within each other: $(s_0, l_0), \ldots, (s_{i-1}, l_{i-1}), (s_i, l_i), (s_{i+1}, l_{i+1})$.



contacts. Conformations having contacts other than those specified by the polymer graph belong to other graphs. In general, the fewer curved links a graph contains, the larger the number of chain conformations consistent with it. Any two curved links in a polymer graph must bear one of three relationships: nested, unrelated, or crossing. Graphs of double-stranded chain conformations involve no crossing links; examples include the simplest hairpin structures, such as dsDNA, and more generally the secondary structures of nucleic acids and antiparallel β-sheets in proteins. In terms of polymer graphs, the partition function of a chain molecule of $(N+1)$ monomers, an $(N+1)$-mer, is given as a sum over all possible polymer graphs:



$$Q_N(T) = \sum_{\Gamma} \Omega(\Gamma)\, e^{-E(\Gamma)/k_B T}, \qquad (1)$$

where $k_B$ is the Boltzmann constant, $T$ is the temperature, $\Gamma$ is an index over the possible polymer graphs, and $E(\Gamma)$ and $\Omega(\Gamma)$ are the energy and the number of conformations of the given polymer graph $\Gamma$, respectively. To calculate $\Omega(\Gamma)$, Chen and Dill developed a matrix multiplication method [16].

B. The matrix method for a given polymer graph 

A complex polymer graph can be divided consecutively into a series of faces, where a face is a region of the graph that is bounded by curved and straight links and contains no other edges; see the shadowed region in Fig. 1. Faces are classified into five types: left (L), middle (M), right (R), left-right (LR) and isolated (I), according to the arrangement of the nested curved links that bound the face. The calculation of the number of conformations $\Omega(\Gamma)$ for a given graph $\Gamma$ is correspondingly separated into two steps: first, count all conformations of each face as if the faces were isolated and independent of each other; second, assemble these conformations into $\Omega(\Gamma)$. To prevent conformations of different faces from bumping into each other (the EV interaction), more detailed information about the conformations of the faces is needed. On a two-dimensional (2D) lattice, this is achieved by exact enumeration. First, the conformations of each face are classified into sixteen types according to the shapes of the ports (inlet and outlet) through which the face is connected to other faces. The port shapes are shown in Fig. 2. The compatibility between neighboring faces can then be checked exactly through the spatial compatibility between the outlet of one face and the inlet of the next. It is convenient to introduce two matrices: the Face Count Matrix (FCM) $S_t$, whose element $(S_t)_{ij}$ is the number of conformations of a type-$t$ face having an inlet of type $i$ ($1 \le i \le 4$) and an outlet of type $j$ ($1 \le j \le 4$), and the viability matrix $Y_{t_1 t_2}$, whose element $(Y_{t_1 t_2})_{ij}$ is 1 or 0 according to whether the connection between a type-$i$ outlet of a type-$t_1$ face and a type-$j$ inlet of a type-$t_2$ face is viable or not. For a hairpin graph $\Gamma$ having $M$ faces, $\Omega(\Gamma)$ is obtained as a product of matrices:

$$\Omega(\Gamma) = U \cdot S_{t_M} \cdot Y_{t_M t_{M-1}} \cdot S_{t_{M-1}} \cdots S_{t_1} \cdot U^{T}, \qquad (2)$$
where $U = \{1,1,1,1\}$ and $U^{T}$ is the transpose of $U$.
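As a minimal numerical sketch of the matrix product in Eq. (2), the chain of face and viability matrices can be multiplied out directly. The function name `count_graph_conformations` and the placeholder matrices are illustrative assumptions, not the enumerated FCMs and viability matrices of the model; the faces are assumed to be supplied in the order $t_M, \ldots, t_1$.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def count_graph_conformations(face_matrices, viability_matrices):
    """Evaluate U . S_M . Y . S_{M-1} ... S_1 . U^T for a chain of M faces.

    face_matrices: list of 4x4 matrices, ordered from face t_M down to t_1.
    viability_matrices: list of M-1 4x4 matrices with 0/1 entries, one for
    each junction between neighboring faces (hypothetical inputs).
    """
    if len(viability_matrices) != len(face_matrices) - 1:
        raise ValueError("need exactly one viability matrix per face junction")
    product = face_matrices[0]
    for y, s in zip(viability_matrices, face_matrices[1:]):
        product = matmul(matmul(product, y), s)
    # With U = (1, 1, 1, 1), U . P . U^T is the sum of all entries of P.
    return sum(sum(row) for row in product)


# Placeholder data: one conformation per port pairing, all like ports viable.
toy_face = [[1] * 4 for _ in range(4)]
toy_viability = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
single_face_count = count_graph_conformations([toy_face], [])
two_face_count = count_graph_conformations([toy_face, toy_face], [toy_viability])
```

Setting an entry of a viability matrix to 0 removes every conformation that would pass through that outlet-inlet pairing, which is how the product accounts for EV clashes between neighboring faces.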




FIG. 2: The four types (type 1 to type 4) of port (inlet or outlet) shapes on the 2D lattice.



C. The partition function 

To calculate the partition function of a whole chain using Eq. (1), the sum over all possible polymer graphs must be carried out. For double-stranded conformations, an efficient dynamic programming algorithm has been developed for this purpose. The idea is to start with a short chain segment, elongate the segment by one monomer at each step, and recursively calculate the partition function of the longer segment from the result for the shorter one. The algorithms will be discussed as needed in the following sections. A more useful alternative expression of Eq. (1) is

$$Q_N(T) = \sum_{E} g_N(E)\, e^{-E/k_B T}, \qquad (3)$$

where $g_N(E)$ is the density of states, i.e., the number of conformations having energy $E$, which is defined as

$$g_N(E) = \sum_{\Gamma} \Omega(\Gamma)\, \delta\big(E(\Gamma) - E\big). \qquad (4)$$



III. CONSTANT EXTENSION ENSEMBLES FOR DOUBLE-STRANDED CHAIN MOLECULES: 
HAIRPIN AND SECONDARY STRUCTURE CONFORMATIONS 



Fig. 3 depicts the situation studied in this paper. The two ends of a chain molecule are grasped by two pins. Instead of fixing the end-to-end distance (EED) $r$, its projection along the direction $x_0$, i.e., the extension $x$, is required to be constant. The average force $f$ along $x_0$ is recorded as a function of $x$. Because of the extension constraint, the partition function of Eq. (3) is modified to $Q_N(T; x)$,

$$Q_N(T; x) = \sum_{E} g_N(E; x)\, e^{-\beta E}, \qquad (5)$$

where $\beta = 1/k_B T$ and $g_N(E; x)$ is the number of conformations with energy $E$ whose extension is $x$. The force $f$ can then be calculated by

$$f(x, T) = -k_B T\, \frac{\partial}{\partial x} \log Q_N(T; x). \qquad (6)$$

Since our chain model is on a 2D lattice, in the following sections the constant extension $x$ is replaced by a discrete variable $\Delta$.
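On the lattice, the derivative in the force formula of Eq. (6) has to be discretized. The following is a minimal sketch, assuming a forward difference between neighboring extensions; the text does not state which discretization is used, and the function name, argument names and toy values are hypothetical.

```python
import math


def force_extension_curve(partition_by_extension, temperature, k_b=1.0, spacing=1.0):
    """Estimate f(Delta) = -k_B T [ln Q(Delta+1) - ln Q(Delta)] / b.

    partition_by_extension maps an integer extension Delta to Q_N(T; Delta).
    The forward difference is an assumed discretization of Eq. (6).
    """
    forces = {}
    for delta in sorted(partition_by_extension):
        if delta + 1 not in partition_by_extension:
            continue  # no neighboring extension to difference against
        log_ratio = math.log(partition_by_extension[delta + 1]) - math.log(
            partition_by_extension[delta]
        )
        forces[delta] = -k_b * temperature * log_ratio / spacing
    return forces


# Toy partition function that decreases with extension (illustrative values only).
toy_q = {0: 8.0, 1: 4.0, 2: 1.0}
toy_forces = force_extension_curve(toy_q, temperature=1.0)
```

With this sign convention the force is positive whenever the number of accessible conformations decreases with extension, i.e., it is the force needed to hold the chain at that extension.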





FIG. 3: Sketch of the constant extension experiment on the 2D lattice. The larger dark points represent the two ends of a chain molecule, which are grasped by two pins. Monomers are denoted by small dark points. In the present paper, only the projection $x$ of the separation $r$ along the direction $x_0$ is held constant. The average force $f$ along $x_0$ is recorded as a function of $x$.



The appearance of the extension $\Delta$ as a parameter requires the statistical model of double-stranded chains to be modified and extended carefully. In the next two subsections, we show how to compute $g_N(E; \Delta)$ for hairpin and secondary structure conformations, respectively.



A. Constant extension ensembles of hairpin conformations 



As one of the simplest elements of secondary structure, hairpin conformations exist in a large class of biomolecules, such as RNA hairpins, peptide β-hairpins and DNA hairpins. Recent work has shown that hairpin conformations have remarkable thermodynamic and kinetic behaviors. In constant force ensembles, the mechanical behavior of hairpin conformations is completely different from that of secondary structure conformations. It is interesting to see whether new differences appear in constant extension ensembles. In addition, the more precise formulas available for hairpins, and the fact that they capture the essence of the theory for complex secondary structure conformations, lead us to explore their properties independently.

In the polymer graph of a hairpin conformation, every curved link bears a nested relationship to every other curved link. The polymer graph is composed of two parts: two non-self-contacting tail chains $(s, s_0)$ and $(l_0, l)$, and one double-stranded region $(s_0, l_0)$, which is defined as a Closed Graph (CG); see Fig. 1. The number of conformations for a given graph equals the product of the number of conformations of the double-stranded region and the number of conformations of the two tails. The graphs are classified into four types according to the type of the outermost face (the LR type is excluded from hairpin conformations). To sum over all polymer graphs, two matrices have been defined, the Closed Graph Count Matrix (CGCM) $G^*_t[E, s_0, l_0]$ and the diagonal matrix $\omega[s, s_0; l_0, l]$: $(G^*_t[E, s_0, l_0])_{ij}$ is the number of conformations, summed over all possible type-$t$ graphs having energy $E$, in which the outermost link spans from vertex $l_0$ to vertex $s_0$ and the innermost and outermost links are in conformations of types $i$ and $j$; the diagonal element $(\omega[s, s_0; l_0, l])_{ii}$ is the number of conformations of the two tails $(s, s_0)$ and $(l_0, l)$ that are spatially compatible with a type-$i$ conformation. The density of states $g_N(E)$ can then be written as



$$g_N(E) = U \cdot \sum_{t,\,s_0,\,l_0} \omega[s, s_0; l_0, l] \cdot G^*_t[E, s_0, l_0] \cdot U^{T}, \qquad (7)$$

where $1 \le t \le 4$.




FIG. 4: (A) Sketch of hairpin conformations at constant extension $x$. Here the double-stranded region contributes +1 to the extension of the whole chain. (B) Illustration of how a conformation ending at $(x, y)$ is distributed over the whole lattice plane by the eight symmetry transformations of the square, where the shadowed circles represent the double-stranded regions.



When the extension of hairpin conformations is fixed, $g_N(E)$ is extended to $g_N(E; \Delta)$, and $\omega[s, s_0; l_0, l]$ is resolved into $\omega[s, s_0; l_0, l \,|\, \Delta]$. We rewrite Eq. (7) as

$$g_N(E; \Delta) = U \cdot \sum_{t,\,s_0,\,l_0} \omega[s, s_0; l_0, l \,|\, \Delta] \cdot G^*_t[E, s_0, l_0] \cdot U^{T}. \qquad (8)$$

Fig. 4(a) shows that the extension of a chain on the 2D lattice does not depend directly on the detailed CG structure, i.e., only the outermost link $(s_0, l_0)$ contributes $\pm 1$ or 0 to $\Delta$. Although in real nucleic acids the typical length of a hydrogen bond is about 3 times the distance between nucleotides, this distinction is not meaningful at our coarse-grained level of description. $\Delta$ is enumerated in three steps: first, fix one conformation of a given CG on the lattice and grow two tails that are compatible with the graph type; then, enumerate the extensions of the two tails and combine them with the projection of the CG along the $x_0$ direction; finally, distribute the conformation over the full lattice plane by the eight symmetry transformations of the square (the $D_4$ group); see Fig. 4(b). The basic idea is very simple, but carrying out the process is cumbersome. Rather than giving a detailed analysis here, we illustrate the process in the Appendix for the simplest case.
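The last of the three enumeration steps, distributing one enumerated conformation over the lattice plane, can be sketched as follows. This is only an illustration of the eight square symmetry operations acting on an end-to-end vector; the names are hypothetical, and each image is counted once, which assumes that the eight images of a conformation are distinct.

```python
from collections import Counter

# The eight symmetry operations of the square (four rotations, four
# reflections), acting on a lattice vector (x, y).
D4_OPERATIONS = (
    lambda x, y: (x, y),
    lambda x, y: (-y, x),
    lambda x, y: (-x, -y),
    lambda x, y: (y, -x),
    lambda x, y: (x, -y),
    lambda x, y: (-x, y),
    lambda x, y: (y, x),
    lambda x, y: (-y, -x),
)


def extension_histogram(end_to_end_vector):
    """Count how many of the eight images have each projection on the x0 axis."""
    histogram = Counter()
    for operation in D4_OPERATIONS:
        image_x, _ = operation(*end_to_end_vector)
        histogram[image_x] += 1
    return histogram


hist_generic = extension_histogram((2, 1))
hist_axis = extension_histogram((3, 0))
```

A single enumerated conformation with end-to-end vector $(2, 1)$ thus contributes to four different extensions, $\pm 2$ and $\pm 1$, with weight two each.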

Although the enumeration method gives quite accurate hairpin extensions, it unfortunately cannot be applied to longer chains. We propose a more practical approximation in the next subsection.

B. Constant extension ensembles of secondary structure conformations 

Secondary structure can be seen as a tree-like structure composed of four basic structural elements, helices, loops, bulges and junctions, arranged in a self-similar manner. Each graph of secondary structure conformations
FIG. 5: Illustration of how unrelated closed polymer graphs are reduced to an effective chain in order to calculate the number of conformations with fixed extension. For example, the two closed graphs $[a, b]$ and $[c, d]$ in the upper panel are replaced by two bonds in the lower panel, and the length of the effective chain is reduced from $l - s$ to $(a - s) + (c - b) + (l - d) + 2$. Since the EV interactions between CGs and single-stranded parts "freeze" the conformations of the monomers neighboring the CGs, it may be reasonable to reduce the chain length further, as shown by the dashed brackets. In the present paper, only the simplest case is considered.




is divided into two levels: the first level is a combination of unrelated double-stranded regions connected by single-stranded chains; the second level is that each CG may be viewed as an independent secondary structure, except that the two end monomers of the region are in contact. We first briefly review how to calculate the density of states $g(E)$ without the mechanical constraint and introduce the necessary definitions.

Unlike the hairpin case, the calculation of the full density of states of secondary structure conformations is more complex. First, all possible CGCMs $G^*_t[E, a, b]$ are computed (note that LR-type faces are now included), where $(a, b)$ is the outermost link of the CG. Since any CG is composed of smaller unrelated closed subgraphs, auxiliary matrices $K_t[E, l, a, b]$, which count the combined conformations of all subgraphs, are introduced, where the cycle length $l$ is the total number of monomers of the single-stranded chains in $[a, b]$. The CGCMs are obtained by multiplying the matrix $K_t[E, l, a, b]$ by the number of conformations of the single-stranded part. Then, in order to combine the conformations of unrelated graphs into a whole, matrices $G_t[E, s, a]$ for the full polymer graph on $[s, a]$ with energy $E$ are defined, where $t = 0, 1, 2$ labels three full-graph types classified according to whether 0, 1 or 2 links are connected to the rightmost monomer; see Fig. 6. The element $(G_t[E, s, a])_{ij}$ is the number of conformations of graphs in which the outermost link of the rightmost subgraph and the innermost link of the leftmost subgraph are in conformations of types $j$ and $i$, respectively. All the matrices mentioned above are calculated by a dynamic programming algorithm. The full density of states of the secondary structure conformations is written as

$$g_N(E) = U \cdot \sum_{t=0}^{2} G_t[E, s, l] \cdot U^{T}. \qquad (9)$$

Now consider the number of conformations with fixed extension $\Delta$. Because, as in the hairpin case, the detailed structures of the CGs do not affect the overall extension directly, the CGs are viewed as effective covalent bonds connecting the left and right parts of a chain; the effective bonds carry the EV interactions of the CGs with the other units of the chain. The secondary structure conformations are thus identified with "open" self-avoiding walks (OSAWs) of a reduced number of monomers; see Fig. 5. OSAWs are self-avoiding walks with no contacts between neighboring sites [17]. The number of OSAWs whose extension is $\Delta$ can be computed by enumeration and extrapolation. Although the effective chain approach (ECA) may overestimate the number of conformations because it accounts for EV interactions only partially, we think it is valuable until better methods are found.
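For short walks, the number of OSAWs ending at a given $x$ coordinate can be obtained by brute-force enumeration. The sketch below is a plain depth-first search under the stated definition (self-avoiding, and no two non-bonded monomers on adjacent lattice sites); it does not reproduce the extrapolation to long chains, and the function and variable names are hypothetical.

```python
from collections import Counter

STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def count_open_saws(n_steps):
    """Return a Counter mapping final x coordinate -> number of n-step OSAWs."""
    counts = Counter()
    path = [(0, 0)]
    occupied = {(0, 0)}

    def touches_earlier_site(site, previous):
        # True if `site` is a lattice neighbor of an occupied site other than
        # the monomer it is bonded to (`previous`), i.e., it forms a contact.
        for dx, dy in STEPS:
            neighbor = (site[0] + dx, site[1] + dy)
            if neighbor != previous and neighbor in occupied:
                return True
        return False

    def extend():
        current = path[-1]
        if len(path) == n_steps + 1:
            counts[current[0]] += 1
            return
        for dx, dy in STEPS:
            site = (current[0] + dx, current[1] + dy)
            if site in occupied or touches_earlier_site(site, current):
                continue
            path.append(site)
            occupied.add(site)
            extend()
            occupied.remove(site)
            path.pop()

    extend()
    return counts


two_step_counts = count_open_saws(2)
three_step_total = sum(count_open_saws(3).values())
```

The cost grows exponentially with the number of steps, which is why direct enumeration of this kind is practical only for short effective chains.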

FIG. 6: The recursive relations for the matrices $G_t[E, n, s, a]$, where $t = 0, 1, 2$.

To implement the ECA, we modify the matrices $G_t[E, s, a]$ to $G_t[E, n, s, a]$, where the new parameter $n$ is the number of monomers of the effective chain. The recursive relations for $G_t[E, s, a]$ (see also Fig. 6) are extended to the following relations:



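$$G_0[E, n, s, a] = G_0[E, n-1, s, a-1] + G_1[E, n-1, s, a-1], \qquad (10)$$

$$\begin{aligned} G_1[E, n, s, a] = {} & \delta_{n,1} \sum_{t=L,M,I} G^*_t[E, s, a] \\ & + \sum_{0<b<a} \sum_{E_1} \sum_{t=L,M,I} G_0[E - E_1, n-1, s, b]\, G^*_t[E_1, b, a] \\ & + \sum_{0<b<a} \sum_{E_1} \sum_{t=M,I} G_1[E - E_1, n-1, s, b]\, G^*_t[E_1, b, a], \end{aligned} \qquad (11)$$

$$\begin{aligned} G_2[E, n, s, a] = {} & \delta_{n,1} \sum_{t=R,LR} G^*_t[E, s, a] \\ & + \sum_{0<b<a} \sum_{E_1} \sum_{t=R,LR} G_0[E - E_1, n-1, s, b]\, G^*_t[E_1, b, a] \\ & + \sum_{0<b<a} \sum_{E_1} G_1[E - E_1, n-1, s, b]\, G^*_R[E_1, b, a]. \end{aligned} \qquad (12)$$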

Correspondingly, $g_N(E)$ is extended to $g_N(E; \Delta)$ as

$$g_N(E; \Delta) = \sum_{n \le N} C_x(n; \Delta) \sum_{t=0}^{2} U \cdot G_t[E, n, s, l] \cdot U^{T}, \qquad (13)$$

where $C_x(n; \Delta)$ is the number of conformations of $n$-step OSAWs whose final $x$ coordinate is $\Delta$. Eq. (13) clearly separates the contribution of the unrelated CGs from that of the single-stranded parts.

Because the hairpin is one of the four elements of secondary structure, the ECA should also be applicable to it. We also note that exact enumeration remains feasible when the lengths of the two tails are smaller than some value (27 in this paper), so a mixture of the two methods is applied in the hairpin case. This is reasonable: when the CG is so large, or the tails so short, that the tail conformations are almost "frozen", the enumeration method is accurate, whereas when the tails are long relative to the CG size, so that the EV interactions between CG and tails are weaker, the ECA is expected to be adequate.

IV. TESTS OF THE MODEL: FECS, CONTACT DISTRIBUTIONS AND FORCE FLIPPING
PHENOMENA

We have described in the previous sections how to extend the statistical mechanical model of double-stranded chain molecules to constant extension ensembles. In this section, the model is used to explore FECs, contact distributions, and force flipping phenomena. To account for specific monomer sequences, chains are divided into two classes: homogeneous and RNA-like. Each contact in a homogeneous chain contributes a sticking energy $-\epsilon$ ($\epsilon > 0$). An RNA-like chain has a specific sequence of four types of monomers, A, U, C and G, resembling the four types of bases of an RNA; only an A-U pair or a C-G pair contributes a sticking energy $-\epsilon$. In the following, the lattice spacing is denoted by $b$.

Before beginning our discussion, the even-odd oscillations of the partition function $Q_N(T; \Delta)$ with $\Delta$, which result from the restriction to a 2D lattice, have to be removed. We damp the oscillations out simply by forming the square-root (geometric) mean of neighboring partition functions,

$$Z_N(T; \Delta) = \left[ Q_N(T; \Delta)\, Q_N(T; \Delta + 1) \right]^{1/2}. \qquad (14)$$

$Q_N(T; \Delta)$ in Eq. (6) is then replaced by $Z_N(T; \Delta)$.
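The damping step can be sketched in a few lines, assuming that the square-root mean pairs each extension with its next neighbor $\Delta + 1$. The function name and the toy values, chosen to oscillate strongly between even and odd extensions, are hypothetical.

```python
import math


def damp_even_odd(partition_by_extension):
    """Geometric mean of Q at neighboring extensions Delta and Delta + 1."""
    return {
        delta: math.sqrt(q * partition_by_extension[delta + 1])
        for delta, q in partition_by_extension.items()
        if delta + 1 in partition_by_extension
    }


# Toy input with an artificial even-odd oscillation (illustrative values only).
oscillating_q = {0: 16.0, 1: 1.0, 2: 4.0, 3: 0.25}
smoothed_z = damp_even_odd(oscillating_q)
```

In this toy example the raw values jump up and down between neighboring extensions, while the smoothed values 4, 2, 1 decrease monotonically, so a finite-difference force computed from them no longer alternates in sign.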



A. Force-Extension curves 



We investigate how the temperature $T$ and the number of monomers $(N + 1)$ affect the FECs of homogeneous chains in hairpin and secondary structure conformations, respectively; see Fig. 7. Only positive extensions are shown. These FECs share several common features. First, the force $f$ always increases monotonically with the extension $x$, except in the initial part. Second, the shapes of the FECs in the monotonic regions are very similar at different values of $N$ for either type of conformation, which suggests that $f$ may be a function of the relative extension $\rho = x/N$. Finally, over the same extension interval, the force changes faster at higher temperature. To check the relationship between force and relative extension $\rho$, FECs for different values of $N$ at the same temperature were plotted together as $f$ versus $\rho$. We find that these FECs quickly approach asymptotic curves as $N$ increases; these curves depend only on temperature and on the type of chain conformation (results not shown). Differences between the FECs of the two classes of conformations are also apparent. The FECs of hairpins coincide over a longer range of extension than those of secondary structure conformations. As the extension increases, the force increases continuously and smoothly in the secondary structure case, whereas in the hairpin case the force is almost constant until it jumps dramatically as the extension approaches the full length.

It is valuable to compare the FECs with the EFCs of homogeneous double-stranded chain molecules. The latter have been computed in a previous study (see Fig. 8 therein). The comparison concerns the equivalence of the two conjugate ensembles, constant extension and constant force [12]. We find that the FECs and EFCs of the two ensembles are almost consistent in the monotonic regions, though small differences are present. Such deviations are to be expected because of the finite value of $N$.

To illustrate the effects of specific monomer sequences, the FECs of the RNA sequences P5ab, P5abcΔA, and P5abc are computed; see Fig. 8(a). These sequences come from the RNA force-unfolding experiments of Liphardt et al. In sharp contrast to the FECs of homogeneous chains, the FECs of heterogeneous chains no longer increase monotonically: sawtooth-like oscillations are present for all sequences. This behavior is very similar to that of the force curves observed in dsDNA unzipping experiments. The complex features disappear at higher temperature, e.g., at $0.6\,\epsilon/k_B$. We also compare the FECs with the EFCs of the same RNA sequences under force stretching: no oscillations were observed in constant force ensembles. Unlike for homogeneous chains, the equivalence of the two conjugate ensembles is not guaranteed once sequence effects are considered. In particular, the force non-monotonicity survives even in the thermodynamic limit for each sequence realization, because of the absence of self-averaging in biomolecules.

We also calculate the FECs of random, alternating and designed sequences; see Fig. 8(b). The designed sequence is composed of the four elements A, U, C and G arranged as A···ACCCCU···UC···CAAAAG···G, where the dots represent 15-mers of A, U, C and G, consecutively. Comparing the FECs of these artificial sequences with the curves obtained above, we find that the force curve of the random sequence is similar to those of the biomolecular sequences in Fig. 8(a), and that the forces of the alternating sequence are almost the same as those of the homogeneous secondary structure chain, while the FEC of the designed sequence shows a large force drop at an extension of about half the full length. Given the arrangement of the designed sequence, in the absence of an extension constraint the molecule tends to form two independent, identical hairpins simultaneously. The large force drop reflects the complete collapse of one hairpin. Interestingly, when the designed sequence is stretched by a force, only one jump appears, because of the higher cooperativity between the two hairpins.



B. Contact distribution and asymmetric function 

To gain insight into the molecular structure at any extension, we compute the contact distribution $p(i, j; T, m)$ for every possible contact pair $(i, j)$, where $m$ is the discrete value of the extension. $p(i, j; T, m)$ is determined by the ratio of the conditional partition function $Z_N(i, j; T, m)$ for all conformations that contain the contact $(i, j)$ to the full partition function $Z_N(T; m)$, i.e., $p(i, j; T, m) = Z_N(i, j; T, m)/Z_N(T; m)$. As an example, the density diagrams of a 70-mer homogeneous hairpin chain at different extensions are shown in Fig. 9.

Unexpectedly, when the extension is small, as in Figs. 9(b) and (c), the nonzero contacts populate the two sides of the diagonal of the diagrams symmetrically. This means that the CG regions in hairpin conformations tend to
FIG. 7: The FECs of homogeneous chains restricted to secondary structure conformations, (a) and (c), and to hairpin conformations, (b) and (d). To allow comparison with the EFCs of constant force ensembles, the x-coordinate is set to be the force $f$.



be located at the two sides of the chain. We did not previously observe this phenomenon for the same chain in constant force ensembles. To characterize it quantitatively, we introduce the AF $S(L, R)$,

$$S(L, R) = \theta(L - R) \left[ 1 - \frac{\min(L, R)}{\max(L, R)} \right], \qquad (15)$$

where $L$ and $R$ are the numbers of monomers from the middle position of the chain to the two ends of the outermost link of the CG region (see Fig. 10), $\theta$ is the sign function, and $\min(L, R)$ and $\max(L, R)$ are the minimum and maximum of $L$ and $R$, respectively. The $S$ values of CGs located completely on the left (right) of the middle position are taken to be $+1$ ($-1$), while for conformations containing no contacts we take $S = 0$.
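A minimal sketch of the AF follows, under the assumption that it has the form $S(L, R) = \mathrm{sign}(L - R)\,[1 - \min(L, R)/\max(L, R)]$, which reproduces the limiting values stated above ($\pm 1$ for a CG lying wholly on one side, 0 for a symmetric CG). The function name and the example numbers are hypothetical.

```python
def asymmetric_function(left, right):
    """Assumed form S(L, R) = sign(L - R) * (1 - min(L, R) / max(L, R)).

    `left` and `right` are the numbers of monomers from the middle of the
    chain to the two ends of the outermost link of the CG region.
    """
    if left == right:
        return 0.0  # CG centered on the middle of the chain
    sign = 1.0 if left > right else -1.0
    return sign * (1.0 - min(left, right) / max(left, right))


# A CG of maximal size N - x with a single tail of length x on the left:
# L = N/2 - x and R = N/2, which gives |S| = 2x/N.
n_monomers, extension = 30, 6
s_single_tail = asymmetric_function(n_monomers / 2 - extension, n_monomers / 2)
```

For this single-tail geometry the magnitude of the AF equals twice the relative extension, $2x/N$, here 0.4.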
We then define the population probability $p(m, s; T)$ as

$$p(m, s; T) = Z_N(s; T, m)/Z_N(T; m), \qquad (16)$$



where $Z_N(s; T, m)$ is the conditional partition function for all conformations whose AF value is $s$. Figs. 11(a) and (b) show the distributions for a 30-mer homogeneous hairpin chain at temperatures 0.1 and $0.5\,\epsilon/k_B$. We see that $p(m, s; T)$ can be divided into four parts, e.g., in Fig. 11(a): at very small extensions (smaller than about $2b$), the
FIG. 8: The FECs of heterogeneous secondary structure chains: (a) the sequences are the 49-mer P5ab, the 65-mer P5abcΔA and the 69-mer P5abc; (b) the 70-mer sequences are random, alternating and designed. The two colors represent FECs at different temperatures.



maximum values of the distribution are at $s = 0$, i.e., the CGs of the dominant conformations are located in the middle of the chain according to the definition of the AF; when the extension is larger but still smaller than some value (about $15b$), the maxima of the distribution are located at $0 < \pm s < 1$, which means that the CG regions sit at the two sides of the chain. Interestingly, these $s$ values lie on two lines (the 'X' patterns in the figures); if the extension is increased further (but remains smaller than $26b$), the $s$ values of the maxima are $\pm 1$; finally, the distribution shifts to $s = 0$ as the extension reaches the full length.

The behavior of the distribution in the first and third regimes is easy to understand. When the extension is very small or 
vanishes, the favorable conformations of the hairpin are those with the most contacts. At lower temperatures, say 0.1 ε/k_B, 
for a homogeneous chain such conformations form consecutive contacts: (0, N), (1, N − 1), ..., (N/2 − 1, N/2 + 2). 
Hence, the AF value at the maximum of the distribution is 0. When the extension is large enough, or the temperature is higher, 
and the number of monomers in the CG is smaller than N/2, the total number of conformations whose CGs sit at the 
two sides must be far larger than the number whose CGs sit in the middle of the chain. According to the definition of the AF, 
the distribution then naturally concentrates on s = ±1; this is purely a combinatorial result. But why, in a range of 
extensions, do the dominant conformations tend to form at the two sides of the chain, with an AF that is neither 0 nor ±1, 
even though their CGs span the middle position? Since the contact interactions are uniform along the chain, energy cannot 
be the reason. We believe that the nonuniform EV interactions of the CG regions and two tails with different lengths (see 
Fig. 10) lead to the tendency: under fixed extension and the same CG size, the number of conformations 
having two tails of lengths l_1 and l_2 is smaller than the number of conformations having a single (l_1 + l_2)-mer tail; 
in other words, for the same number of tail monomers, the EV effect of the CG region is stronger with two tails than with 
one. At temperature 0.1 ε/k_B and a given extension x, the largest number of contacts is favored, so the 
maximum CG size, an (N − x)-mer, is favorable. Supposing that a single tail exists, say at the left, one easily obtains the 
relation s ≈ ±2x/N = ±2ρ. Compared with Fig. 11(a), the formula agrees with the pattern very well: the slopes 
of the two lines are about ±2. On the other hand, Fig. 11(b) shows that the slopes increase with temperature. Indeed, at a 
higher temperature T_h, the maximum CG size is disfavored by thermal fluctuations. For the same extension, let 
the values of L and R at the original temperature be l and r, respectively. If t contacts are broken at T_h, the 
s value changes from 1 − r/l to 1 − (r − t)/(l − t). Evidently the AF is an increasing function of t, while t is in turn an 
increasing function of temperature. As a comparison, we partially cancel the EV interactions: when the length of one of the 
two tails is larger than some value, here taken to be 5b, the interactions are screened. The distribution of the 30-mer chain is 
computed again under this requirement; see Fig. 12(a). The maxima of the distribution at a fixed extension 
are no longer located at a unique s; that is, the CGs are distributed uniformly along the chain. Because the lines with slopes 
±2 represent the maximum asymmetry at temperature 0.1 ε/k_B, the other AF values must lie between them (see the "butterfly" 
pattern in Fig. 12(a)). 
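
The two relations used above, s ≈ 2x/N for a single tail and the growth of 1 − (r − t)/(l − t) with t, can be checked with a few lines of Python. This is only a sketch under an assumed convention: monomers are labelled 0..N, L and R are measured from the middle position N/2 to the two monomers of the outermost contact, the magnitude of s is 1 − min(L, R)/max(L, R), and the sign is positive when the longer tail is on the left. The function names and the sign choice are hypothetical and are not taken from the model.

```python
def asymmetry_factor(i, j, n):
    """Asymmetry factor s of a hairpin whose outermost contact links monomers i < j.

    Assumed convention: monomers are labelled 0..n, the middle of the chain
    is at n/2, L = n/2 - i, R = j - n/2, and |s| = 1 - min(L, R) / max(L, R).
    The sign (positive when the longer tail is on the left) is an arbitrary
    choice made for this illustration.
    """
    mid = n / 2
    left, right = mid - i, j - mid
    if left <= 0 or right <= 0:
        raise ValueError("the outermost contact must straddle the middle of the chain")
    magnitude = 1 - min(left, right) / max(left, right)
    return magnitude if right >= left else -magnitude


def af_after_breaking(l, r, t):
    """s after t outermost contacts break: 1 - (r - t) / (l - t), for l > r > t >= 0."""
    return 1 - (r - t) / (l - t)


if __name__ == "__main__":
    n = 30
    # One tail of length x at the left: outermost contact (x, n), so s = 2x/n.
    for x in (3, 6, 9):
        print(x, asymmetry_factor(x, n, n), 2 * x / n)
    # Breaking contacts at higher temperature raises s when l > r.
    print([round(af_after_breaking(15, 9, t), 3) for t in range(4)])
```

The monotonic growth follows from d/dt [1 − (r − t)/(l − t)] = (l − r)/(l − t)^2, which is positive whenever l > r.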

The asymmetry concept may also provide useful insight into the force-stretching problem. Define p(f, s; T) = 
Q_N(f, s; T)/Q_N(T, f), where the conditional partition function Q_N(f, s; T) runs over all conformations whose asymmetric 
factor is s, and Q_N(T, f) is the full partition function p^. Fig. 12(b) is the diagram of p(f, s; 0.1) for the 30-mer homogeneous 
FIG. 9: Density diagrams of the contact probability p(i, j; T, m) for hairpin conformations of a 70-mer homogeneous chain at 
extensions 1, 10, 30 and 50b. The temperature is 0.1 ε/k_B. At the smaller extensions, such as in (b) and (c), the nonzero 
contacts populate the two sides of the diagram instead of lying along its diagonal. 




FIG. 10: Sketch of the definition of the AF s. L and R are the numbers of monomers from the middle position of the chain 
('x' symbol) to the outmost link of the CG region. An even value of N is considered here for simplicity. 
FIG. 11: The population probabilities p(m, s; T) for hairpin conformations of a 30-mer homogeneous chain. The temperatures are 
0.1 and 0.5 ε/k_B. Clear 'X' patterns can be observed in the figures, which shows that the CG regions of the dominant 
conformations at these extensions tend to form at the two sides of the chain. 



FIG. 12: (a) The population probabilities p(m, s; T) for the 30-mer homogeneous chain when the EV interactions between the CG 
regions and the two tails are only partially considered; the 'X' pattern in Fig. 11 transforms into a "butterfly" pattern. 
(b) The population probabilities p(f, s; T) at temperature 0.1 ε/k_B, where forces are in units of ε/b. The range of 
s = ±1 becomes narrower as the value of N increases, since the EFCs of the hairpin conformations show first-order-like 
behavior. Both panels are labelled temperature 0.1. 

hairpin chain. p(f, s; T) is very different from p(x, s; T): the AF values of the former change from 0 to ±1 
and re-enter 0 within narrow force ranges, whereas for the latter the AF changes continuously and slowly. At 
temperature 0.1 ε/k_B, when forces are smaller than 0.48 ε/b, the states with the largest number of contacts are favorable, 
which means that the most probable AF value is 0. Because of the higher cooperativity of hairpin conformations, the 
structures collapse within a small force range, which means that the AF vanishes again. The temporary s = ±1 
should be a result of the finite chain length. 
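
The ratio p(f, s; T) = Q_N(f, s; T)/Q_N(T, f) can be illustrated with a short Python sketch. It assumes that each state carries a Boltzmann weight g·exp[(ε n_c + f x)/(k_B T)], with n_c contacts, extension x and degeneracy g, in units where ε = k_B = b = 1. The four-state list below is a hypothetical toy chosen only to show the 0 → ±1 → 0 sequence of the most probable AF; it is not output of the model.

```python
import math
from collections import defaultdict


def af_distribution(states, force, temperature):
    """Return {s: p(f, s; T)} for states given as (contacts, extension, s, degeneracy).

    Assumed weight per state: g * exp((contacts + force * extension) / T),
    in units where eps = k_B = b = 1.
    """
    exponents = [(c + force * x) / temperature for c, x, _, _ in states]
    shift = max(exponents)  # subtract the largest exponent to avoid overflow
    restricted = defaultdict(float)  # Q_N(f, s; T) up to a common factor
    for (_, _, s, g), e in zip(states, exponents):
        restricted[s] += g * math.exp(e - shift)
    total = sum(restricted.values())  # Q_N(T, f) up to the same factor
    return {s: q / total for s, q in restricted.items()}


# Hypothetical toy states (contacts, extension, s, degeneracy); not model data.
TOY_STATES = [
    (10, 1, 0.0, 1),    # fully zipped hairpin, symmetric
    (8, 10, 1.0, 1),    # partly unzipped, CG region at one end
    (8, 10, -1.0, 1),   # partly unzipped, CG region at the other end
    (0, 30, 0.0, 1),    # fully stretched chain, no contacts
]

if __name__ == "__main__":
    for f in (0.1, 0.3, 0.5):
        dist = af_distribution(TOY_STATES, f, 0.1)
        print(f, {s: round(p, 3) for s, p in dist.items()})
```

In this toy the s = ±1 states dominate only in an intermediate force window, which is the kind of narrow re-entrant range described in the text.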

C. Force flipping phenomena 

Finally, we explore how the force depends on temperature at fixed extension. This study is related to the 
re-entering transition studied in the unzipping of dsDNA(l9[ 1^. The transition means that if one fixes the external force at 
a value in a finite range and decreases the temperature, beginning with the stretched state, the dsDNA first collapses to a 
globular state; when the temperature is lowered further, the dsDNA re-enters the stretched state. We have discussed 
the phenomenon for homogeneous chains of hairpin and secondary structure conformations stretched at constant force in 
Ref. I p^ : the re-entering transition is present only in the hairpin case. To our knowledge, the conjugate of the 
re-entering transition in the constant extension ensemble has not been investigated before. Our model can be used to examine 
the problem by numerically computing the FTCs of homogeneous chains of hairpin and secondary structure conformations, 
respectively. The results for 25-mer chains are shown in Fig. 13: force flipping (the arrow therein) is present at lower 
temperatures in hairpin conformations, but is not observed in secondary structure conformations. We believe that the 
flipping corresponds to the re-entering transition in the constant force ensemble. This result can be explained 
qualitatively. First, the temperatures of the flipping and re-entering transitions are almost the same, i.e., about 0.23 ε/k_B [p2[. 
Second, just as flipping is absent in secondary structures, re-entering transitions of these conformations are not 
observed in the constant force ensemble. Finally, because of the equivalence of the constant extension and constant force 
ensembles for homogeneous chains, the FTCs at fixed extension can be derived from the extension-temperature 
curves (ETCs) at fixed force. In the fixed force ensemble, if there is a dip at a lower temperature T_r in the ETC at a 
given force, then at least two temperatures located on the two sides of T_r correspond to the same extension x_0. If the 
dip rises as the force increases, the extension x_0 is reached at a series of temperatures on the two sides of T_r, which 
form a convex in the FTC at T_r. The dips in the ETCs of hairpin conformations do have such characteristics ||2^. The 
same approach can be used to understand the FTC behavior of secondary structure conformations. 
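
The last step of the argument, reading an FTC off the ETCs, amounts to solving x(f, T) = x_0 for f at each temperature. A minimal Python sketch of that inversion is given below. It assumes the two ensembles are equivalent and uses the fact that a thermal mean extension does not decrease with force, so bisection applies. The function `toy_extension` is a hypothetical stand-in (the per-step extension tanh(f/T) of a one-dimensional chain of ±1 steps, with b = k_B = 1); it has no dip and therefore shows no flipping, and the model's own ETCs would have to be supplied in its place.

```python
import math


def force_at_extension(mean_extension, x0, temperature, f_lo=0.0, f_hi=50.0, tol=1e-10):
    """Solve mean_extension(f, T) = x0 for f by bisection.

    Relies on the mean extension being non-decreasing in f at fixed T.
    """
    lo_val = mean_extension(f_lo, temperature)
    hi_val = mean_extension(f_hi, temperature)
    if not (lo_val <= x0 <= hi_val):
        raise ValueError("x0 is not bracketed by [f_lo, f_hi]")
    while f_hi - f_lo > tol:
        mid = 0.5 * (f_lo + f_hi)
        if mean_extension(mid, temperature) < x0:
            f_lo = mid
        else:
            f_hi = mid
    return 0.5 * (f_lo + f_hi)


def toy_extension(f, temperature):
    """Hypothetical ETC: per-step extension of a 1D chain of +/-1 steps."""
    return math.tanh(f / temperature)


if __name__ == "__main__":
    # One point of the FTC per temperature, at fixed extension x0 = 0.5.
    for T in (0.5, 1.0, 1.5, 2.0):
        print(T, force_at_extension(toy_extension, 0.5, T))
```

For this toy the result is f = T·artanh(x_0), a straight line in T; a dip in the ETCs at low temperature is what would bend such a curve into the convex discussed above.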




FIG. 13: The FTCs of 25-mer chains of secondary structure (a) and hairpin conformations (b) at different fixed extensions, 
where the sequences are homogeneous. The horizontal axes show the temperature T in units of ε/k_B. The arrow in (b) points 
out the convex, which demonstrates the appearance of the force-flipping transition. 



V. SUMMARY 



In this paper, we propose a statistical model of the constant extension ensemble of double-stranded chain molecules. 
Unlike several theoretical models proposed previously, the specific monomer sequence and the EV interaction are included 
exactly. Using the model, we investigate how FECs depend on chain length, sequence, and structure. In addition, 
the model relates FECs directly to molecular structure. The structures of hairpin conformations reveal that 
CG regions tend to form at the two sides of the chain when the extension lies in a certain range. This unexpected phenomenon 
has not been reported before. By introducing the AF and analyzing it in detail, we attribute the phenomenon to 
EV interactions between the CG regions and the tails. We also explore the conjugate of the re-entering transition in the 
constant extension ensemble, the force-flipping phenomenon, which is present only in hairpin conformations. Because our aim in 
this paper is illustrative, the model is largely simplified: the chain is restricted to a 2D lattice, chain stiffness 
(bending) and elasticity are neglected, and no stacking interaction is involved. Nevertheless, our results still provide 
important physical information, especially on the importance of the long-range EV interaction. In future work, we will 
replace the projected extension by real EEDs. It will be interesting to see whether the positional tendency of the CGs is 
still observed. 
APPENDIX A: CALCULATING EXTENSIONS OF THE HAIRPIN CONFORMATIONS 
FIG. 14: One class of conformations for the 2nd tail (l_0, l) and the closed graph (s_0, l_0) containing the outmost type 1 
link, where the tail is of stiff type and extends upward [^. The end l, located at (x', y') in frame O', is translated to the 
point (x', y' − 1) in frame O, and the corresponding extension Δ is x'. The conformation can be mapped over the lattice plane 
by the eight symmetry transformations of the square (see Fig. k|). The corresponding extensions Δ are tabulated, where the 
character "T" denotes the transformation elements. 



For hairpin conformations, the conformations of the tails are determined by two factors: the outermost link type and 
the length of the closed graph. To account for EV interactions, the tails have been classified into two types, stiff (s) and 
flex (f)|l^]. Correspondingly, the number of conformations is n^{t_i}(N_i), where t_i (s or f) is the type of the ith (1st or 
2nd) tail with length N_i, i.e., an N_i-step OSAW. In order to calculate the extension of the whole chain, we introduce 
additional definitions: n_x^{t_i}(N_i; m), the number of conformations of the ith OSAW of type t_i whose final x coordinate 
is m; and n_y^{t_i}(N_i; m), defined similarly for the y coordinate. It is easy to see that 
n^{t_i}(N_i) = Σ_m n_x^{t_i}(N_i; m) = Σ_m n_y^{t_i}(N_i; m). Here we 
illustrate the calculation of the simplest diagonal matrix element ω(l, l_0; s_0, s|Δ)_{11}, where the outmost link is of type 
1 and s = s_0, l ≥ l_0. Because the more general cases involve more cumbersome formulas, we do not list them in the 
present paper. 
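
The bookkeeping behind the sum rule n(N) = Σ_m n_x(N; m) = Σ_m n_y(N; m) can be illustrated by brute-force enumeration. The Python sketch below counts ordinary self-avoiding walks that start from a free origin on the square lattice and bins them by final x and y coordinate. It is an assumption-laden illustration only: it does not implement the stiff or flex restrictions of the tails, nor the excluded volume of the CG region, and the function name is hypothetical.

```python
from collections import Counter

STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def endpoint_histograms(n_steps):
    """Enumerate all n-step self-avoiding walks from the origin on the square lattice.

    Returns (total, by_x, by_y): by_x[m] counts walks whose final x coordinate
    is m, and by_y[m] does the same for the final y coordinate.
    """
    by_x, by_y = Counter(), Counter()

    def extend(pos, visited, remaining):
        if remaining == 0:
            by_x[pos[0]] += 1
            by_y[pos[1]] += 1
            return
        for dx, dy in STEPS:
            nxt = (pos[0] + dx, pos[1] + dy)
            if nxt not in visited:  # self-avoidance
                visited.add(nxt)
                extend(nxt, visited, remaining - 1)
                visited.remove(nxt)

    extend((0, 0), {(0, 0)}, n_steps)
    return sum(by_x.values()), by_x, by_y


if __name__ == "__main__":
    total, by_x, by_y = endpoint_histograms(4)
    print(total, sorted(by_x.items()), sorted(by_y.items()))
```

Each walk is counted once in each histogram, so both histograms sum to the total number of walks; for the unrestricted walks used here the two histograms also coincide by the x-y symmetry of the lattice, which need not hold for tails attached to a CG.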

There are six classes of possible conformations for the simplest CG-tail complexes [pT| . Fig. 14 shows one 
of them. The length of the 2nd tail is then N_2 = l − l_0. Since the length of the CG also affects the EV interactions, we 
distinguish different CG sizes. 

(1) l_0 − s_0 = 3. All six classes of conformations are viable. Therefore, 

uil, lo; so, s\A)n = 2 [n{;{N2; A) + n^^N^; -A) + n^^(7V2; A - 1) + n^^(7V2; -(A + 1)) 
+nl^N2;A) + n^^ {N2; -A) + n^^ {N2; A - 1) + ni^{N2; -(A + 1)) 
+2n^n^2; A) + 2nl-{N2; - A) + n^^ (TVs; A - 1) + n^^iVa; -(A + 1)) 
+n;n^2, A + 1) + nl- {N2; -(A - 1)) 

+2 «^ [N, A) + <^ {N, - A) + <^ (A^, A - 1) + n^^ [N, -(A + 1))) 

- (2^0, A + (5l,A + <5-l,A + <^7V2,A + + '^JV2 + 1,A + 6_{N2 + 1).a)] 1 (Al) 



where the coefficient 2 is the degree of degeneracy along the projection, and the negative part eliminates the straight-tail 
conformations, which are counted repeatedly. 
(2) l_0 − s_0 > 3. Five classes of conformations are involved. We have 

toil, lo; so, s|A)n = 2 [nl^{N2; A) + n^- (TVs; -A) + n{f{N2]A - 1) + 7i^^(iV2; -(A + 1)) 
+nlHN2; A) + nmN2; -A) + nl' (iVj; A - 1) + nl' {N2:. -(A + 1)) 
+nmN2; A) + y^^^TVs; -A) + 7i^^(7V2; A - 1) + ^^^(iVa; -(A + 1)) 
+n;=(7V2, A + 1) + (iVa; - (A - 1)) 

+2 (A^, A) + <^ (iV, -A)) + <^ {N, A - 1) + (iV, -(A + 1)) 

- ((5i,A + (5-1, A + Sn2,a + S-n2,a)] ■ (A2) 